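Deep convolutional neural networks (DCNNs) have achieved human-level accuracy in face identification (Phillips et al., 2018), though it remains unclear how accurately they discriminate highly similar faces. Here, humans and a DCNN performed a challenging face-identity matching task that included identical twins. Participants (n = 87) viewed pairs of face images of three types: same-identity pairs, general imposter pairs (different identities from similar demographic groups), and twin imposter pairs (identical twin siblings). The task was to determine whether each pair showed the same person or different people. Identity comparisons were tested in three viewpoint-disparity conditions: frontal to frontal, frontal to 45-degree profile, and frontal to 90-degree profile. Accuracy for discriminating matched-identity pairs from twin imposters and from general imposters was assessed in each viewpoint-disparity condition. Humans were more accurate for general imposter pairs than for twin imposter pairs, and accuracy declined as the viewpoint disparity between the images in a pair increased. A DCNN trained for face identification (Ranjan et al., 2018) was tested on the same image pairs presented to humans. Machine performance mirrored the pattern of human accuracy, but with performance at or above that of all humans in all but one condition. Human and machine similarity scores were compared across all image-pair types. This item-level analysis showed that human and machine similarity ratings correlated significantly in six of the nine image-pair types [range r = 0.38 to r = 0.63], suggesting general accord between the perception of face similarity by humans and by the DCNN. These findings also contribute to our understanding of DCNN performance in discriminating high-resemblance faces, demonstrate that the DCNN performs at or above human level, and suggest a degree of parity between the features used by humans and the DCNN.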
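Neural representations are popular for representing shapes, as they can be learned from sensor data and used for data cleanup, model completion, shape editing, and shape synthesis. Current neural representations can be categorized as either overfitting to a single object instance or representing a collection of objects. However, neither allows accurate editing of neural scene representations: on the one hand, methods that overfit objects achieve highly accurate reconstructions but do not generalize to unseen object configurations and thus cannot support editing; on the other hand, methods that represent a family of objects with variations do generalize but produce only approximate reconstructions. We propose Neuform to combine the advantages of overfitted and generalizable representations by adaptively using the one most appropriate for each shape region: the overfitted representation where reliable data is available, and the generalizable representation everywhere else. We achieve this with a carefully designed architecture and an approach that blends the network weights of the two representations, avoiding seams and other artifacts. We demonstrate edits that successfully reconfigure parts of human-designed shapes, such as chairs, tables, and lamps, while preserving semantic integrity and the accuracy of an overfitted shape representation. We compare with two state-of-the-art competitors and demonstrate clear improvements in the plausibility and fidelity of the resulting edits.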
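Data augmentation is an important component in the robustness evaluation of natural language processing (NLP) models, as well as in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework that supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards, and robustness analysis results are publicly available on the NL-Augmenter repository (\url{https://github.com/gem-benchmark/nl-augmenter}).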
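The distinction between a transformation (modifies the data) and a filter (selects a data split by some feature) can be illustrated with a minimal sketch. The class names, method names, and the toy perturbation below are hypothetical and are not taken from the NL-Augmenter code base; they only show how the two concepts might compose.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transformation:
    """Hypothetical wrapper: maps one sentence to one or more modified sentences."""
    name: str
    fn: Callable[[str], List[str]]

    def generate(self, sentence: str) -> List[str]:
        return self.fn(sentence)


@dataclass
class Filter:
    """Hypothetical wrapper: keeps a sentence only if it has a given feature."""
    name: str
    predicate: Callable[[str], bool]

    def filter(self, sentence: str) -> bool:
        return self.predicate(sentence)


def swap_leading_chars(sentence: str) -> List[str]:
    """Toy perturbation: swap the first two characters of every word with 4+ characters."""
    words = [w[1] + w[0] + w[2:] if len(w) >= 4 else w for w in sentence.split()]
    return [" ".join(words)]


typo = Transformation("swap_leading_chars", swap_leading_chars)
short_only = Filter("at_most_5_tokens", lambda s: len(s.split()) <= 5)

corpus = [
    "The model reads text",
    "A much longer sentence that exceeds the token limit",
]

# First select a data split with the filter, then perturb it with the transformation.
subset = [s for s in corpus if short_only.filter(s)]
augmented = [out for s in subset for out in typo.generate(s)]
```

Evaluating a model on `subset` versus `augmented` would then give a simple robustness comparison for that one perturbation.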
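We show that denoising diffusion probabilistic models (DDPMs), a class of score-based generative models, can be used to produce realistic yet fake images that mimic observations of galaxies. Our method is tested with Dark Energy Spectroscopic Instrument grz imaging of galaxies from the Photometry and Rotation curve OBservations from Extragalactic Surveys (PROBES) sample and of galaxies selected from the Sloan Digital Sky Survey. Subjectively, the generated galaxies are highly realistic when compared with samples from the real dataset. We quantify the similarity by borrowing from the deep generative learning literature, using the `Fr\'echet Inception Distance' to test for subjective and morphological similarity. We also introduce the `Synthetic Galaxy Distance' metric to compare the emergent physical properties (such as total magnitude, colour, and half-light radius) of a ground-truth parent dataset and a synthesised child dataset. We argue that the DDPM approach produces sharper and more realistic images than other generative methods such as adversarial networks (with the downside of more costly inference), and that it could be used to produce large samples of synthetic observations tailored to a specific imaging survey. We demonstrate two potential uses of the DDPM: (1) accurate in-painting of occluded data, such as satellite trails, and (2) domain transfer, where new input images can be processed to mimic the properties of the DDPM's training set. Here we `DESI-fy' cartoon images as a proof of concept for domain transfer. Finally, we suggest potential applications of score-based approaches that could motivate further research on this topic within the astronomical community.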
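For readers unfamiliar with the Fréchet Inception Distance, the underlying quantity is the Fréchet distance between two Gaussians fitted to feature vectors of real and generated images: the squared distance between the means plus Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^{1/2}). The sketch below computes that distance with NumPy only. It assumes the features have already been extracted; the random arrays are stand-ins for Inception activations, and the function name is illustrative.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # For positive semi-definite covariances, Tr((cov_a cov_b)^{1/2}) equals the sum
    # of the square roots of the eigenvalues of cov_a @ cov_b.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Stand-in features (assumption: real use would take activations of a pretrained network).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(200, 4))
fake_feats = real_feats + np.array([1.0, 0.0, 0.0, 0.0])  # same spread, shifted mean

distance = frechet_distance(real_feats, fake_feats)  # approximately 1.0, the squared mean shift
```

Because the two stand-in sets share the same covariance, only the mean term contributes here; a generator that also distorts the spread of features would be penalised through the trace term.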
We demonstrate a proof-of-concept of a large language model conducting corporate lobbying related activities. We use an autoregressive large language model (OpenAI's text-davinci-003) to determine whether proposed U.S. Congressional bills are relevant to specific public companies and to provide explanations and confidence levels. For the bills the model deems relevant, the model drafts a letter to the sponsor of the bill in an attempt to persuade the congressperson to make changes to the proposed legislation. We use hundreds of ground-truth labels of the relevance of a bill to a company to benchmark the performance of the model, which outperforms the baseline of predicting the most common outcome of irrelevance. However, when we test the ability to determine the relevance of a bill with the previous OpenAI GPT-3 model (text-davinci-002), which was state-of-the-art on many language tasks until text-davinci-003 was released on November 28, 2022, its performance is worse than simply always predicting that a bill is irrelevant to a company. These results suggest that, as large language models continue to improve core natural language understanding capabilities, performance on corporate lobbying related tasks will continue to improve. We then discuss why this could be problematic for societal-AI alignment.
In recent years, deep learning has seen increasing use in histopathological applications. However, while these approaches have shown great potential, in high-risk environments deep learning models need to be able to judge their own uncertainty and to reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole-Slide-Images under domain shift using the H\&E stained Camelyon17 breast cancer dataset. Although it is known that histopathological data can be subject to strong domain shift and label noise, to our knowledge this is the first work that compares the most common methods for uncertainty estimation under these aspects. In our experiments, we compare Stochastic Variational Inference, Monte-Carlo Dropout, Deep Ensembles, Test-Time Data Augmentation, and combinations thereof. We observe that ensembles of methods generally lead to higher accuracies and better calibration, and that Test-Time Data Augmentation can be a promising alternative when an appropriate set of augmentations is chosen. Across methods, rejecting the most uncertain tiles leads to a significant increase in classification accuracy on both in-distribution and out-of-distribution data. Furthermore, we conduct experiments comparing these methods under varying conditions of label noise. We observe that the border regions of the Camelyon17 dataset are subject to label noise and evaluate the robustness of the included methods against different noise levels. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.
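The idea of rejecting the most uncertain tiles can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code of the work above: uncertainty is taken to be the entropy of the prediction averaged over several stochastic forward passes (as one might obtain from Monte-Carlo Dropout or an ensemble), and the probabilities and labels are toy values.

```python
import numpy as np


def predictive_entropy(probs: np.ndarray):
    """probs has shape (n_passes, n_tiles, n_classes).

    Returns the entropy of the pass-averaged prediction per tile and the averaged probabilities.
    """
    mean_probs = probs.mean(axis=0)
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=1)
    return entropy, mean_probs


def accuracy_after_rejection(probs: np.ndarray, labels: np.ndarray, reject_fraction: float) -> float:
    """Accuracy on the tiles that remain after dropping the most uncertain fraction."""
    entropy, mean_probs = predictive_entropy(probs)
    n_keep = len(labels) - int(round(reject_fraction * len(labels)))
    keep = np.argsort(entropy)[:n_keep]  # lowest-entropy (most certain) tiles first
    preds = mean_probs[keep].argmax(axis=1)
    return float((preds == labels[keep]).mean())


# Toy data: 2 stochastic passes, 4 tiles, 2 classes (tumour vs. normal, say).
probs = np.array([
    [[0.95, 0.05], [0.10, 0.90], [0.55, 0.45], [0.20, 0.80]],
    [[0.90, 0.10], [0.05, 0.95], [0.60, 0.40], [0.30, 0.70]],
])
labels = np.array([0, 1, 1, 1])  # the third tile is misclassified and also the least certain

acc_all = accuracy_after_rejection(probs, labels, reject_fraction=0.0)
acc_rejected = accuracy_after_rejection(probs, labels, reject_fraction=0.25)
```

In this toy case the single misclassified tile is also the one with the highest entropy, so rejecting a quarter of the tiles removes it; on real data the gain depends on how well uncertainty tracks errors.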
In large-scale machine learning, recent works have studied the effects of compressing gradients in stochastic optimization in order to alleviate the communication bottleneck. These works have collectively revealed that stochastic gradient descent (SGD) is robust to structured perturbations such as quantization, sparsification, and delays. Perhaps surprisingly, despite the surge of interest in large-scale, multi-agent reinforcement learning, almost nothing is known about the analogous question: Are common reinforcement learning (RL) algorithms also robust to similar perturbations? In this paper, we investigate this question by studying a variant of the classical temporal difference (TD) learning algorithm with a perturbed update direction, where a general compression operator is used to model the perturbation. Our main technical contribution is to show that compressed TD algorithms, coupled with an error-feedback mechanism used widely in optimization, exhibit the same non-asymptotic theoretical guarantees as their SGD counterparts. We then extend our results significantly to nonlinear stochastic approximation algorithms and multi-agent settings. In particular, we prove that for multi-agent TD learning, one can achieve linear convergence speedups in the number of agents while communicating just $\tilde{O}(1)$ bits per agent at each time step. Our work is the first to provide finite-time results in RL that account for general compression operators and error-feedback in tandem with linear function approximation and Markovian sampling. Our analysis hinges on studying the drift of a novel Lyapunov function that captures the dynamics of a memory variable introduced by error feedback.
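A single-agent version of compressed TD learning with error feedback can be sketched as below. Several choices are assumptions made for illustration: the compressor is top-k sparsification, the step size is folded into the update before compression, and the Markov reward process and features are toy values. The memory variable `error` stores whatever the compressor dropped so that it can be re-injected at the next step.

```python
import numpy as np


def top_k(v: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of v and zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out


def compressed_td0(P, rewards, features, gamma=0.9, alpha=0.05, k=1, steps=2000, seed=0):
    """TD(0) with linear function approximation, top-k compression, and error feedback."""
    rng = np.random.default_rng(seed)
    n_states, dim = features.shape
    theta = np.zeros(dim)
    error = np.zeros(dim)      # memory variable: accumulated compression error
    raw_total = np.zeros(dim)  # sum of uncompressed updates, kept only for inspection
    s = 0
    for _ in range(steps):
        s_next = rng.choice(n_states, p=P[s])  # Markovian sampling along one trajectory
        td_error = rewards[s] + gamma * features[s_next] @ theta - features[s] @ theta
        g = alpha * td_error * features[s]     # uncompressed TD update direction
        raw_total += g
        corrected = error + g                  # add back what was dropped earlier
        h = top_k(corrected, k)                # only this sparse vector would be transmitted
        error = corrected - h                  # remember what was dropped this time
        theta = theta + h
        s = s_next
    return theta, error, raw_total


# Toy 3-state Markov reward process with dense, full-rank features.
P = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]])
rewards = np.array([1.0, 0.0, 2.0])
features = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.5, 0.5, 1.0]])

theta, error, raw_total = compressed_td0(P, rewards, features)
```

By construction, `theta + error` equals the sum of all uncompressed updates, which is the sense in which error feedback ensures no update information is permanently discarded.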
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely oversee the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
Research on automated essay scoring has become increasingly important because it serves as a method for evaluating students' written responses at scale. Scalable methods for scoring written responses are needed as students migrate to online learning environments, resulting in the need to evaluate large numbers of written-response assessments. The purpose of this study is to describe and evaluate three active learning methods that can be used to minimize the number of essays that must be scored by human raters while still providing the data needed to train a modern automated essay scoring system. The three active learning methods are the uncertainty-based, the topological-based, and the hybrid method. These three methods were used to select essays included as part of the Automated Student Assessment Prize competition, which were then classified using a scoring model that was trained with the Bidirectional Encoder Representations from Transformers (BERT) language model. All three active learning methods produced strong results, with the topological-based method producing the most efficient classification. Growth rate accuracy was also evaluated. The active learning methods produced different levels of efficiency under different sample size allocations but, overall, all three methods were highly efficient and produced classifications that were similar to one another.
While the brain connectivity network can inform the understanding and diagnosis of developmental dyslexia, its cause-effect relationships have not yet been sufficiently examined. Employing electroencephalography signals and a band-limited white noise stimulus at 4.8 Hz (the prosodic-syllabic frequency), we measure the phase Granger causalities among channels to identify differences between dyslexic learners and controls, thereby proposing a method to calculate directional connectivity. As causal relationships run in both directions, we explore three scenarios, namely channels' activity as sources, as sinks, and in total. Our proposed method can be used for both classification and exploratory analysis. In all scenarios, we find confirmation of the established right-lateralized Theta sampling network anomaly, in line with the temporal sampling framework's assumption of oscillatory differences in the Theta and Gamma bands. Further, we show that this anomaly primarily occurs in the causal relationships of channels acting as sinks, where it is significantly more pronounced than when only total activity is observed. In the sink scenario, our classifier obtains 0.84 and 0.88 accuracy and 0.87 and 0.93 AUC for the Theta and Gamma bands, respectively.
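The three scenarios (sources, sinks, total) can be pictured with a directed connectivity matrix. The sketch below is an assumption about how such per-channel summaries might be formed, not the method of the work above: `G[i, j]` is taken to be the strength of the causal influence from channel `i` to channel `j`, and activity is aggregated by simple sums over toy values.

```python
import numpy as np


def channel_activity(G):
    """Summarise a directed connectivity matrix per channel.

    G[i, j] is assumed to hold the influence of channel i on channel j.
    Returns (activity as source, activity as sink, total activity).
    """
    G = np.asarray(G, dtype=float).copy()
    np.fill_diagonal(G, 0.0)    # ignore a channel's influence on itself
    as_source = G.sum(axis=1)   # outgoing influence: row sums
    as_sink = G.sum(axis=0)     # incoming influence: column sums
    return as_source, as_sink, as_source + as_sink


# Toy 3-channel matrix (illustrative values only).
G = np.array([
    [0.0, 0.2, 0.7],
    [0.1, 0.0, 0.4],
    [0.0, 0.3, 0.0],
])
source, sink, total = channel_activity(G)
```

Here the third channel has the largest sink activity even though the totals of the first two channels are equal, which illustrates how a sink-specific view can expose a difference that the total hides.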